ASYMPTOTIC PROPERTIES OF COVARIATE-ADAPTIVE RANDOMIZATION

By Yanqing Hu and Feifang Hu

University of Virginia

Balancing treatment allocation for influential covariates is critical in clinical trials. This has become increasingly important as more and more biomarkers are found to be associated with different diseases in translational research (genomics, proteomics and metabolomics). Stratified permuted block randomization and minimization methods [Pocock and Simon Biometrics 31 (1975) 103-115, etc.] are the two most popular approaches in practice. However, stratified permuted block randomization fails to achieve good overall balance when the number of strata is large, whereas traditional minimization methods also suffer from the potential drawback of large within-stratum imbalances. Moreover, the theoretical bases of minimization methods remain largely elusive. In this paper, we propose a new covariate-adaptive design that is able to control various types of imbalances. We show that the joint process of within-stratum imbalances is a positive recurrent Markov chain under certain conditions. Therefore, this new procedure yields more balanced allocation. The advantages of the proposed procedure are also demonstrated by extensive simulation studies. Our work provides a theoretical tool for future research in this area.

1. Introduction. In clinical trials, covariates are factors that have a large impact on the responses of the patients. Typical covariates include gender, age, disease stage, research center, etc. At the design stage it is often important to balance treatment allocation over covariates, as a well-balanced trial can lead to more efficient statistical comparisons and to results that are more convincing to a general audience [Kundt (2009)]. Balanced allocation is also particularly useful when the sample size is small or when interim analysis or subgroup analysis is desired [Toorawa et al. (2009)].
Stratified randomization is a popular way of achieving balance. It defines strata as different combinations of the covariates' levels and employs permuted block randomization within each stratum. This method is easy to implement and achieves good balance when the number of strata is small [Kalish and Begg (1985)]. However, the permuted block design is susceptible to selection bias [Matts and Lachin (1988)]. Moreover, it tends to cause severe allocation imbalance in the whole trial when there are too many strata, typically as a result of many covariates or many levels within individual covariates [Pocock (1982)]. An increasing number of strata, however, has become the trend, due to the need to conduct multicenter trials as well as the inclusion of newly identified biomarkers as covariates [Khan et al. (2010), Li et al. (2010), McIlroy et al. (2010), etc.].
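To make the two properties of permuted blocks concrete, here is a minimal sketch (the helper names `permuted_block_sequence` and `overall_imbalance` are hypothetical, and +1/-1 coding of treatments 1/2 is an assumption for convenience). Within one stratum the running difference can never exceed half the block size, but when strata are small their incomplete blocks each leave a residual imbalance, and these residuals add up across strata.

```python
import random
from itertools import accumulate


def permuted_block_sequence(n, block_size=4, rng=None):
    """Assignments for one stratum under permuted block randomization.

    +1 codes treatment 1 and -1 codes treatment 2; every block holds
    block_size // 2 patients of each arm in random order (block_size even).
    """
    if rng is None:
        rng = random.Random()
    seq = []
    while len(seq) < n:
        block = [1] * (block_size // 2) + [-1] * (block_size // 2)
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n]  # the last block may be incomplete


def overall_imbalance(stratum_sizes, block_size=4, seed=0):
    """Overall difference D when each stratum is randomized separately."""
    rng = random.Random(seed)
    return sum(sum(permuted_block_sequence(k, block_size, rng)) for k in stratum_sizes)


# Within a stratum the running difference stays within block_size / 2.
path = list(accumulate(permuted_block_sequence(40, rng=random.Random(1))))
print(max(abs(d) for d in path) <= 2)   # True
# Complete blocks cancel exactly ...
print(overall_imbalance([4] * 50))      # 0
# ... but 100 strata with 3 patients each leave +1 or -1 apiece,
# so the overall difference behaves like a sum of 100 random signs.
print(overall_imbalance([3] * 100))
```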

Covariate-adaptive randomization (or minimization) has been proposed to address the above problem. The earliest work on minimization dates back to Taves (1974) and Pocock and Simon (1975). In particular, with $I$ being the number of covariates and $m_i$ the number of levels for covariate $i$, $i = 1, \dots, I$, Pocock and Simon's (1975) procedure minimizes a weighted average of marginal imbalances $\sum_{i=1}^{I} w_i d_i(n)$, where $d_i(n)$ is a measure of imbalance among treatment groups with respect to the $i$th margin of the new patient. Simulation studies [Weir and Lees (2003), Toorawa et al. (2009), Kundt (2009)] found that this method reduces marginal imbalances as well as the overall imbalance. Wei (1978) generalized Taves's method by introducing a marginal urn model. Other works include Zelen (1974), Nordle and Brantmark (1977), Signorini et al. (1993) and Heritier, Gebski and Pillai (2005), which used a hierarchical decision rule and set priority among elements of strata, margins and overall trial. Despite the numerous works in the literature, "very little is known about the theoretical properties of covariate-adaptive designs" [Rosenberger and Sverdlov (2008)].

A model-based approach was introduced by Begg and Iglewicz (1980) and Atkinson (1982), and the theoretical work was developed by Smith (1984a, 1984b). Smith considered the linear model $E y_n = T_n \alpha + \sum_{j=1}^{p} z_{n,j} \beta_j$ with homogeneous errors and no interaction of any type, where $y_n$, $T_n$ and $(z_{n,1}, \dots, z_{n,p})$ are the response, assignment and covariate values of the $n$th patient, respectively, and $T_n = +1$ or $-1$ for treatment 1 or 2. Since $\alpha$, the treatment effect, is the main interest of the trial, this method sequentially skews the allocation probability toward the treatment that would lead to a smaller variance of $\hat{\alpha}$ (the MLE of $\alpha$). Under some appropriate allocation functions, Smith derived the asymptotic normality of $\sum_{i=1}^{n} z_{i,j} T_i$ ($j = 1, \dots, p$). This asymptotic property was further applied to the construction of a conditional permutation test [Smith (1984b)].

Although the minimization approach [Pocock and Simon (1975), Wei (1978), etc.] and the model-based approach [Smith (1984a, 1984b), etc.] both lead to marginal and overall balance, they are rather different in nature. First, even if they use the same biased coin function, the two allocation rules are still not the same, except in the trivial case of no covariates. Hence Smith's asymptotic result does not readily apply to Pocock and Simon's or Wei's procedure. Second, Smith's result depends on the homogeneous linear model. Therefore, once the data type changes (such as binary or survival responses), the model-based approach does not necessarily imply balance. Finally, the minimization approach is more popular in practice [Taves (2010)]. In fact, as discussed by many authors such as Lagakos and Pocock (1984), Smith (1984b) and McEntegart (2003), balanced allocation enhances the credibility of trials for medical professionals who are less statistically sophisticated, and simple comparisons of similar groups of patients are often more acceptable than a model-based approach adjusting for covariates.

In this paper we focus on the minimization approach that compares patient numbers at different levels. While the marginal procedures achieve good balance with respect to the margins and the whole trial, their performance within individual strata is not as satisfactory [Signorini et al. (1993), Kundt (2009)]. Wei (1978) gave a short proof that if no interaction exists, marginal balance is sufficient to ensure unbiased estimation of the treatment effect in an unadjusted analysis. Conversely, when interactions do exist, ignoring within-stratum imbalances may lead to biased estimation. Moreover, as the field of personalized medicine develops [Hu (2012)], subgroup analysis is often desired, and allocation balance within individual strata can improve the precision of such analysis.

To overcome the potential drawbacks of stratification and of Pocock and Simon's (1975) method, we develop in this paper a new randomization procedure that considers a weighted average of three types of imbalances (within-stratum, within-covariate-margin and overall). By adopting Efron's (1971) discrete allocation function, the next patient is assigned with higher probability to the treatment that leads to a smaller value of the weighted average.

To study the theoretical properties of the new procedure, the main difficulties include the correlation structure of the within-stratum imbalances as well as the discreteness of the allocation function. In the literature, a large number of adaptive designs adopt a continuous allocation function, and their properties are often obtained by a Taylor expansion of the allocation function, accompanied by a martingale approximation [Bai and Hu (1999), Hu and Zhang (2004), Zhang, Hu and Cheung (2006), etc.]. Since we use Efron's function, which is discontinuous at 0, the Taylor expansion is not feasible. We instead take advantage of an alternative technique, namely "drift conditions," which was developed to study the stability of Markov chains on general state spaces. We show that the joint process of within-stratum imbalances under the new procedure is a positive recurrent Markov chain under some conditions, and thus is of order $O_P(1)$ at the within-stratum level. Our simulations suggest that the within-stratum imbalances under Pocock and Simon's (1975) design have fast-increasing variances as the sample size increases, implying a slower rate than $O_P(1)$.

In Section 2, the new procedure is described in general with $I$ covariates. The theoretical results for the new procedure are given in Section 3. We further use simulations to study the different covariate-adaptive designs in Section 4 and conclude our paper with some observations in Section 5. The proofs of the theorems can be found in Section 6 and the supplemental article [Hu and Hu (2012)].

2. The new covariate-adaptive randomization procedure. The setting is similar to that of Pocock and Simon (1975), except that we only focus on two treatment groups, 1 and 2. Consider $I$ covariates and $m_i$ levels for the $i$th covariate, resulting in $m = \prod_{i=1}^{I} m_i$ strata. Let $T_j$ be the assignment of the $j$th patient, $j = 1, \dots, n$, that is, $T_j = 1$ for treatment 1 and $T_j = 0$ for treatment 2. Let $Z_j$ indicate the covariate profile of that patient, that is, $Z_j = (k_1, \dots, k_I)$ if his or her $i$th covariate is at level $k_i$, $1 \le i \le I$ and $1 \le k_i \le m_i$. For convenience, we use $(k_1, \dots, k_I)$ to denote the stratum formed by patients who possess the same covariate profile $(k_1, \dots, k_I)$, and use $(i; k_i)$ to denote the margin formed by patients whose $i$th covariate is at level $k_i$.

The new procedure is defined as follows: 

(1) The first patient is assigned to treatment 1 with probability 1/2.

(2) Suppose $(n-1)$ patients have been assigned to a treatment ($n > 1$) and the $n$th patient falls within stratum $(k_1^*, \dots, k_I^*)$.

(3) For the first $(n-1)$ patients:

- let $D_{n-1}$ be the difference between the numbers of patients in treatment groups 1 and 2 in the whole trial, that is, the number in group 1 minus the number in group 2;

- similarly, let $D_{n-1}(i; k_i^*)$ and $D_{n-1}(k_1^*, \dots, k_I^*)$ be the differences between the numbers of patients in the two treatment groups on the margin $(i; k_i^*)$ and within the stratum $(k_1^*, \dots, k_I^*)$, respectively;

- these differences can be positive, negative or zero, and each one is used to measure the imbalance at the corresponding level (overall, marginal or within-stratum).

(4) If the $n$th patient were assigned to treatment 1, then $D_n^{(1)} = D_{n-1} + 1$ would be the "potential" overall difference between the two groups; similarly,

$$D_n^{(1)}(i; k_i^*) = D_{n-1}(i; k_i^*) + 1$$

and

$$D_n^{(1)}(k_1^*, \dots, k_I^*) = D_{n-1}(k_1^*, \dots, k_I^*) + 1$$

would be the potential differences on margin $(i; k_i^*)$ and within stratum $(k_1^*, \dots, k_I^*)$, respectively.

(5) Define an imbalance measure $\mathrm{Imb}_n^{(1)}$ by

$$\mathrm{Imb}_n^{(1)} = w_o \big[D_n^{(1)}\big]^2 + \sum_{i=1}^{I} w_{m,i} \big[D_n^{(1)}(i; k_i^*)\big]^2 + w_s \big[D_n^{(1)}(k_1^*, \dots, k_I^*)\big]^2,$$

which is the weighted imbalance that would be caused if the $n$th patient were assigned to treatment 1. Here $w_o$, $w_{m,i}$ and $w_s$ are nonnegative weights placed on the overall level, within a covariate margin and within a stratum cell, respectively. Without loss of generality we can assume

$$w_o + w_s + \sum_{i=1}^{I} w_{m,i} = 1.$$

(6) In the same manner we can define $\mathrm{Imb}_n^{(2)}$, the weighted imbalance that would be caused if the $n$th patient were assigned to treatment 2. In this case, the three types of potential differences are the existing ones minus 1, instead of plus 1.

(7) Conditional on the assignments of the first $(n-1)$ patients as well as the covariate profiles of the first $n$ patients, assign the $n$th patient to treatment 1 with probability

$$P(T_n = 1 \mid \mathbf{Z}_n, \mathbf{T}_{n-1}) = \begin{cases} q, & \text{if } \mathrm{Imb}_n^{(1)} > \mathrm{Imb}_n^{(2)}, \\ p, & \text{if } \mathrm{Imb}_n^{(1)} < \mathrm{Imb}_n^{(2)}, \\ 0.5, & \text{otherwise}, \end{cases}$$

where $n > 1$, $0 < q < p < 1$, $p + q = 1$, $\mathbf{Z}_n = (Z_1, \dots, Z_n)$ and $\mathbf{T}_{n-1} = (T_1, \dots, T_{n-1})$.

Remark 2.1. When $w_o = w_s = 0$, that is, only the marginal imbalances are considered, the proposed design reduces to a special case of Pocock and Simon's (1975) method; and when $w_{m,i} = w_o = 0$ for all $i$, it reduces to stratified randomization, where a separate biased coin is employed to determine the assignment within each stratum. However, we will explore procedures with other choices of weights, to see whether they can lead to more balanced allocation from various perspectives.

Remark 2.2. In the literature, different views have been given as to the selection of the biasing probability $p$. Efron (1971) suggested $p = 2/3$, but his method does not consider covariates. More recent papers, especially those involving covariate-adaptive randomization, suggested larger values of $p$, such as 0.85, 0.90 and 0.95. See Weir and Lees (2003), Hagino et al. (2004), Toorawa et al. (2009), and Hu, Zhang and He (2009). One may also use other generators in step (7), for example, Wei's (1978) generator; the properties of the design will then be different.
Table 1
An example illustrating the calculation under the new procedure

                                 $D_{50}(\cdot)$   $D_{51}^{(1)}(\cdot) = D_{50}(\cdot) + 1$   $D_{51}^{(2)}(\cdot) = D_{50}(\cdot) - 1$
Overall                               0                     1                                         -1
Margin of males (1;1)                 0                     1                                         -1
Margin of smokers (2;1)              -1                     0                                         -2
Stratum of male smokers (1,1)        -2                    -1                                         -3

Example 1. Suppose that in a trial two covariates, gender and smoking behavior, are considered to be influential, each of which has two levels. Thus, the 4 strata (1,1), (1,2), (2,1), (2,2) represent male smokers, male nonsmokers, female smokers and female nonsmokers, respectively. Assume that the weights are $w_o = 1/3$, $w_{m,1} = w_{m,2} = 1/6$ and $w_s = 1/3$. The first 50 patients have been randomized, and the 4 within-stratum differences among these 50 patients are $-2$, $+2$, $+1$ and $-1$. If the 51st patient is a male smoker, then the current imbalances are calculated as:

- overall: $D_{50} = -2 + 2 + 1 - 1 = 0$;

- margin of males: $D_{50}(1; 1) = -2 + 2 = 0$;

- margin of smokers: $D_{50}(2; 1) = -2 + 1 = -1$;

- stratum of male smokers: $D_{50}(1, 1) = -2$.

The potential imbalances if the new patient were assigned to treatment 1 or 2 are given in Table 1. Therefore,

$$\mathrm{Imb}_{51}^{(1)} = (1)^2 \cdot \tfrac{1}{3} + (1)^2 \cdot \tfrac{1}{6} + (0)^2 \cdot \tfrac{1}{6} + (-1)^2 \cdot \tfrac{1}{3} = 0.83,$$

$$\mathrm{Imb}_{51}^{(2)} = (-1)^2 \cdot \tfrac{1}{3} + (-1)^2 \cdot \tfrac{1}{6} + (-2)^2 \cdot \tfrac{1}{6} + (-3)^2 \cdot \tfrac{1}{3} = 4.17.$$

Since $\mathrm{Imb}_{51}^{(1)} = 0.83 < \mathrm{Imb}_{51}^{(2)} = 4.17$, the coin will be biased toward treatment 1, which is assigned with probability $p > 0.5$.
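Steps (3) to (7) can be sketched in a few lines, reproducing Example 1 exactly. This is an illustrative sketch, not the authors' code: the function names are hypothetical, strata are keyed by covariate-level tuples, and `Fraction` is used only so the weighted imbalances come out as the exact values 5/6 and 25/6 behind the rounded 0.83 and 4.17.

```python
from fractions import Fraction


def current_differences(strata_diff, target):
    """Step (3): overall, marginal and within-stratum differences seen by a
    patient in stratum `target`, derived from the within-stratum differences."""
    overall = sum(strata_diff.values())
    margins = [
        sum(d for s, d in strata_diff.items() if s[i] == target[i])
        for i in range(len(target))
    ]
    return overall, margins, strata_diff.get(target, 0)


def imbalance(overall, margins, stratum, w_o, w_m, w_s, shift):
    """Steps (4)-(6): weighted squared imbalance after adding `shift`
    (+1 = treatment 1, -1 = treatment 2) at every level."""
    value = w_o * (overall + shift) ** 2
    value += sum(w * (d + shift) ** 2 for w, d in zip(w_m, margins))
    value += w_s * (stratum + shift) ** 2
    return value


def prob_treatment_1(imb1, imb2, p):
    """Step (7): Efron-type biased coin with p > 1/2 and q = 1 - p."""
    if imb1 > imb2:
        return 1 - p
    if imb1 < imb2:
        return p
    return 0.5


# Example 1: within-stratum differences after 50 patients.
strata = {(1, 1): -2, (1, 2): 2, (2, 1): 1, (2, 2): -1}
w_o, w_m, w_s = Fraction(1, 3), [Fraction(1, 6), Fraction(1, 6)], Fraction(1, 3)
overall, margins, stratum = current_differences(strata, (1, 1))
imb1 = imbalance(overall, margins, stratum, w_o, w_m, w_s, +1)
imb2 = imbalance(overall, margins, stratum, w_o, w_m, w_s, -1)
print(overall, margins, stratum)           # 0 [0, -1] -2
print(imb1, imb2)                          # 5/6 25/6
print(prob_treatment_1(imb1, imb2, 0.85))  # 0.85
```

Setting `w_o = w_s = 0` in the same functions gives a Pocock and Simon type marginal rule, in line with Remark 2.1.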

3. Theoretical properties of the new design. We now investigate the asymptotic properties of the proposed design. For the first $n$ patients, $D_n(k_1, \dots, k_I)$ is the actual difference in patient numbers within stratum $(k_1, \dots, k_I)$. Furthermore, let

$$\mathbf{D}_n = \big[D_n(k_1, \dots, k_I)\big]_{1 \le k_1 \le m_1, \dots, 1 \le k_I \le m_I}$$

be an array of dimension $m_1 \times \cdots \times m_I$ which stores the current assignment differences in all strata. Also, assume that the covariates $Z_1, Z_2, \dots$ are independently and identically distributed. Since $Z_n = (k_1, \dots, k_I)$ can take $m = \prod_{i=1}^{I} m_i$ different values, it in fact follows an $m$-dimensional multinomial distribution with parameter $\mathbf{p} = (p(k_1, \dots, k_I))$, each element being the probability that a patient falls within the corresponding stratum. Obviously, $p(k_1, \dots, k_I) \ge 0$ and $\sum_{k_1, \dots, k_I} p(k_1, \dots, k_I) = 1$.

First, we notice that $(\mathbf{D}_n)_{n \ge 1}$ is a Markov chain on the space $\mathbb{Z}^m$. In fact, by definition of the new procedure, $\mathbf{D}_n$ is a function $f$ of $(\mathbf{D}_{n-1}, Z_n, T_n)$. Moreover, conditional on $\mathbf{D}_{n-1}$, $(Z_n, T_n)$ is independent of $(\mathbf{D}_1, \dots, \mathbf{D}_{n-2})$; therefore, $\mathbf{D}_n = f(\mathbf{D}_{n-1}, Z_n, T_n)$ is also conditionally independent of $(\mathbf{D}_1, \dots, \mathbf{D}_{n-2})$.

We next explore the conditions under which $(\mathbf{D}_n)_{n \ge 1}$ is a positive recurrent chain, a desirable property that indicates a fast convergence rate. We first investigate the special case of 2 × 2 strata, that is, only two covariates with two levels each. This case enables us to obtain a finer result than the more general case, and at the same time sheds light on how to set the conditions for the latter. With 2 × 2 strata, the weights in $\mathrm{Imb}_n^{(1)}$ or $\mathrm{Imb}_n^{(2)}$ reduce to $w_o$, $w_{m,1}$, $w_{m,2}$ and $w_s$.

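Theorem 3.1. For the new design, consider 2 covariates and 2 levels for each. $w_o$, $w_{m,1}$, $w_{m,2}$ and $w_s$ are nonnegative with $w_o + w_{m,1} + w_{m,2} + w_s = 1$. If the following two conditions hold:

(A) $w_s > 0$,

(B) define

$$u_1 = w_o + w_{m,1} + w_{m,2} + w_s = 1, \quad u_2 = w_o + w_{m,1}, \quad u_3 = w_o + w_{m,2}, \quad u_4 = w_o;$$

the solution $\mathbf{x} = (x_1, x_2, x_3)$ to the linear equation

$$\begin{pmatrix} u_1 & u_2 & u_3 \\ u_2 & u_1 & u_4 \\ u_3 & u_4 & u_1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} u_4 \\ u_3 \\ u_2 \end{pmatrix}$$

satisfies $|x_1| + |x_2| + |x_3| < 1$;

then $(\mathbf{D}_n)_{n \ge 1}$ is a positive recurrent Markov chain with period 2 on $\mathbb{Z}^4$.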

Remark 3.1. By Theorem 3.1, the chains $\mathbf{D}_{2n+1}$ and $\mathbf{D}_{2n}$ are two ergodic chains and converge to two limit distributions, respectively. Thus, $\mathbf{D}_{2n+1} = O_P(1)$ and $\mathbf{D}_{2n} = O_P(1)$, which implies that $\mathbf{D}_n = O_P(1)$. Accordingly, the imbalances at any level (within strata, on the margins or overall) are of order $O_P(1)$.

Remark 3.2. In fact, $u_1$, $u_2$, $u_3$ and $u_4$ in the above theorem can be interpreted as the weights placed on individual strata: call the stratum in which the current patient falls the "target"; then $u_1$ is the weight on the target itself, $u_2$ ($u_3$) is the weight on the stratum that is at the same level of covariate 1 (covariate 2) as the target, and $u_4$ is the weight on the remaining stratum.

Table 2
Constraint on $w_m$ as a function of $w_o$

$w_o$        0.00   0.20   0.40   0.60   0.80
$C(w_o)$     0.31   0.23   0.17   0.11   0.05

Corollary 3.1. In Theorem 3.1, if we further assume that $w_{m,1} = w_{m,2} := w_m$, then condition (B) is equivalent to

(B′) $$w_m < C(w_o) := \frac{\sqrt{5 w_o^2 + 6 w_o + 5} - (1 + 3 w_o)}{4}.$$

Remark 3.3. See Table 2 for certain values of $C(w_o)$. Since $C(w_o)$ is a decreasing and almost linear function of $w_o$ on $[0, 1]$, condition (B′) is much easier to verify than condition (B). For example, if the weight at the overall level is $w_o = 0.20$, then the weights on the two margins need to be less than 0.23. Therefore, $(w_o, w_{m,1}, w_{m,2}, w_s) = (0.20, 0.22, 0.22, 0.36)$ is a legitimate weight set that ensures positive recurrence.
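Condition (B) and the bound $C(w_o)$ lend themselves to a quick numerical check. The sketch below rests on stated assumptions: it takes the 3 × 3 system in (B) to be $M\mathbf{x} = \mathbf{b}$ with rows $(u_1, u_2, u_3)$, $(u_2, u_1, u_4)$, $(u_3, u_4, u_1)$ and right-hand side $(u_4, u_3, u_2)$, the arrangement suggested by the stratum weights of Remark 3.2, and it takes $C(w_o) = [\sqrt{5w_o^2 + 6w_o + 5} - (1 + 3w_o)]/4$, which is what that system yields when $w_{m,1} = w_{m,2}$. The helper names `solve3`, `condition_b` and `c_bound` are hypothetical. Reproducing the five entries of Table 2 is a consistency check, not a proof.

```python
import math


def solve3(a, b):
    """Solve the 3 x 3 linear system a @ x = b by Cramer's rule."""
    def det(m):
        return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
                - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
                + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))
    d = det(a)
    x = []
    for col in range(3):
        m = [row[:] for row in a]
        for r in range(3):
            m[r][col] = b[r]          # replace one column by the right-hand side
        x.append(det(m) / d)
    return x


def condition_b(w_o, w_m1, w_m2, w_s):
    """|x1| + |x2| + |x3| for the assumed form of the system in condition (B)."""
    u1, u2, u3, u4 = w_o + w_m1 + w_m2 + w_s, w_o + w_m1, w_o + w_m2, w_o
    a = [[u1, u2, u3], [u2, u1, u4], [u3, u4, u1]]
    return sum(abs(v) for v in solve3(a, [u4, u3, u2]))


def c_bound(w_o):
    """Assumed closed form of the bound on w_m when w_m1 = w_m2."""
    return (math.sqrt(5 * w_o ** 2 + 6 * w_o + 5) - (1 + 3 * w_o)) / 4


print([round(c_bound(w), 2) for w in (0.0, 0.2, 0.4, 0.6, 0.8)])
# [0.31, 0.23, 0.17, 0.11, 0.05]
print(condition_b(0.20, 0.22, 0.22, 0.36) < 1)  # True: Remark 3.3 weights
print(condition_b(0.0, 0.5, 0.5, 0.0))          # 3.0: Pocock-Simon weights fail (B)
```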

The next theorem deals with the general case of $m = \prod_{i=1}^{I} m_i$ strata. Using the basic identity $(x + 1)^2 - (x - 1)^2 = 4x$, the critical quantity $\mathrm{Imb}_n^{(1)} - \mathrm{Imb}_n^{(2)}$ in step (7) (Section 2) can be simplified as

$$\mathrm{Imb}_n^{(1)} - \mathrm{Imb}_n^{(2)} = 4\Big\{ w_o D_{n-1} + \sum_{i=1}^{I} w_{m,i} D_{n-1}(i; k_i^*) + w_s D_{n-1}(k_1^*, \dots, k_I^*) \Big\} := 4\,\delta_{n-1}(k_1^*, \dots, k_I^*). \qquad (3.1)$$

Therefore, the biasing probability $p$, $q$ or 1/2 is determined by the sign of $\delta_{n-1}(k_1^*, \dots, k_I^*)$, which is a weighted average of the current imbalances at different levels. Since $D_{n-1}$ and $D_{n-1}(i; k_i^*)$ can both be expressed as sums of certain $D_{n-1}(k_1, \dots, k_I)$'s, we want to reformulate $\delta_{n-1}(k_1^*, \dots, k_I^*)$ as a weighted sum of the imbalances within the individual strata.

As a motivating example, consider 3 covariates, gender (male or female), smoking behavior (smoker or nonsmoker) and clinical center (3 centers), with a total of 12 strata. Suppose that for the new patient $Z_n = (1, 1, 1)$, that is, he falls into the stratum of "male smokers at center 1." Then the weights on the $D_{n-1}(k_1, \dots, k_I)$'s in the expression of $\delta_{n-1}(k_1^*, \dots, k_I^*)$ are shown in Table 3.

Table 3
An example showing the weights of the $D_{n-1}(k_1, \dots, k_I)$'s in $\delta_{n-1}(k_1^*, \dots, k_I^*)$

      Stratum    Description                      Weight
1     (1,1,1)    male smokers at center 1         $w_o + w_{m,1} + w_{m,2} + w_{m,3} + w_s = 1$
2     (1,1,2)    male smokers at center 2         $w_o + w_{m,1} + w_{m,2}$
3     (1,1,3)    male smokers at center 3         $w_o + w_{m,1} + w_{m,2}$
4     (1,2,1)    male nonsmokers at center 1      $w_o + w_{m,1} + w_{m,3}$
5     (2,1,1)    female smokers at center 1       $w_o + w_{m,2} + w_{m,3}$
6     (1,2,2)    male nonsmokers at center 2      $w_o + w_{m,1}$
7     (1,2,3)    male nonsmokers at center 3      $w_o + w_{m,1}$
8     (2,1,2)    female smokers at center 2       $w_o + w_{m,2}$
9     (2,1,3)    female smokers at center 3       $w_o + w_{m,2}$
10    (2,2,1)    female nonsmokers at center 1    $w_o + w_{m,3}$
11    (2,2,2)    female nonsmokers at center 2    $w_o$
12    (2,2,3)    female nonsmokers at center 3    $w_o$

Generally, with respect to the stratum $(k_1^*, \dots, k_I^*)$ in which the new patient falls, we divide the $m = \prod_{i=1}^{I} m_i$ strata into several categories and find the corresponding weights in the expression of $\delta_{n-1}(k_1^*, \dots, k_I^*)$. Let $\mathcal{I} = \{1, 2, \dots, I\}$. For any stratum $(k_1, \dots, k_I)$:

- if $(k_1, \dots, k_I) = (k_1^*, \dots, k_I^*)$, then the weight on $D_{n-1}(k_1, \dots, k_I)$ is $w_o + \sum_{i=1}^{I} w_{m,i} + w_s = 1$;

- for any fixed $i$ ($i \in \mathcal{I}$), if $k_i \ne k_i^*$ and $k_j = k_j^*$ for $j \in \mathcal{I}$ and $j \ne i$, then the weight on $D_{n-1}(k_1, \dots, k_I)$ is $w_o + \sum_{j \ne i} w_{m,j}$, and there are $(m_i - 1)$ strata in this category;

- for any fixed $i_1 < i_2$ ($\{i_1, i_2\} \subset \mathcal{I}$), if $k_{i_1} \ne k_{i_1}^*$, $k_{i_2} \ne k_{i_2}^*$, and $k_j = k_j^*$ for $j \in \mathcal{I}$, $j \ne i_1$ and $j \ne i_2$, then the weight on $D_{n-1}(k_1, \dots, k_I)$ is $w_o + \sum_{j \ne i_1, i_2} w_{m,j}$, and there are $(m_{i_1} - 1)(m_{i_2} - 1)$ strata in this category;

- for any fixed $i_1 < i_2 < \cdots < i_l$ ($\{i_1, \dots, i_l\} \subset \mathcal{I}$), if $k_{i_t} \ne k_{i_t}^*$ for $1 \le t \le l$ and $k_j = k_j^*$ for $j \in \mathcal{I} \setminus \{i_1, \dots, i_l\}$, then the weight on $D_{n-1}(k_1, \dots, k_I)$ is $w_o + \sum_{j \notin \{i_1, \dots, i_l\}} w_{m,j}$, and there are $\prod_{t=1}^{l} (m_{i_t} - 1)$ strata in this category;

- if $k_i \ne k_i^*$ for all $i \in \mathcal{I}$, then the weight on $D_{n-1}(k_1, \dots, k_I)$ is $w_o$, and there are $\prod_{i=1}^{I} (m_i - 1)$ strata in this category.
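The category rules above amount to a single per-stratum weight: $w_o$, plus $w_{m,i}$ for every covariate on which the stratum agrees with the target, plus $w_s$ for the target itself. The following sketch (hypothetical helper names; exact `Fraction` weights chosen for illustration) checks on the Table 3 layout that this weighted sum over strata equals $\delta_{n-1}$ computed directly from (3.1).

```python
import itertools
import random
from fractions import Fraction


def delta_direct(D, target, w_o, w_m, w_s):
    """delta_{n-1} as in (3.1): overall, marginal and within-stratum terms."""
    overall = sum(D.values())
    marginal = sum(
        w * sum(d for s, d in D.items() if s[i] == target[i])
        for i, w in enumerate(w_m)
    )
    return w_o * overall + marginal + w_s * D[target]


def stratum_weight(s, target, w_o, w_m, w_s):
    """Weight on D(s): w_o, plus w_{m,i} for each covariate where s agrees
    with the target, plus w_s when s is the target stratum itself."""
    weight = w_o + sum(w for i, w in enumerate(w_m) if s[i] == target[i])
    return weight + (w_s if s == target else 0)


def delta_by_strata(D, target, w_o, w_m, w_s):
    """The same quantity written as a weighted sum of within-stratum differences."""
    return sum(stratum_weight(s, target, w_o, w_m, w_s) * d for s, d in D.items())


# Table 3 layout: gender (2 levels), smoking (2 levels), center (3 levels).
levels = [2, 2, 3]
strata = list(itertools.product(*(range(1, m + 1) for m in levels)))
w_o, w_m, w_s = Fraction(1, 5), [Fraction(1, 10)] * 3, Fraction(1, 2)
rng = random.Random(0)
D = {s: rng.randint(-3, 3) for s in strata}  # arbitrary current differences
target = (1, 1, 1)
print(delta_direct(D, target, w_o, w_m, w_s) == delta_by_strata(D, target, w_o, w_m, w_s))  # True
print(stratum_weight((1, 2, 1), target, w_o, w_m, w_s))  # 2/5 = w_o + w_{m,1} + w_{m,3}
```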

It is easily verified that

$$m = \prod_{i=1}^{I} m_i = [(m_1 - 1) + 1][(m_2 - 1) + 1] \cdots [(m_I - 1) + 1] = 1 + \sum_{l=1}^{I} \sum_{1 \le i_1 < i_2 < \cdots < i_l \le I} \prod_{t=1}^{l} [m_{i_t} - 1],$$

which is consistent with the counts listed above. Our general theorem in the following is closely related to the above weights and counts.

Theorem 3.2. For the new design, consider $I$ covariates and $m_i$ levels for the $i$th covariate, where $I > 1$, $1 \le i \le I$, and $m_i > 1$. $w_o$, $w_s$ and $w_{m,i}$, $i = 1, \dots, I$, are nonnegative with $w_o + \sum_{i=1}^{I} w_{m,i} + w_s = 1$. If

(C) $$u^* := \sum_{l=1}^{I} \sum_{1 \le i_1 < i_2 < \cdots < i_l \le I} \Big\{ \Big( w_o + \sum_{j \notin \{i_1, \dots, i_l\}} w_{m,j} \Big) \prod_{t=1}^{l} [m_{i_t} - 1] \Big\} < 1/2,$$

then $\mathbf{D}_n$ is a positive recurrent Markov chain on $\mathbb{Z}^m$.

To see the theorem in a more intuitive way, we take a closer look at $u^*$ in the special case of two covariates, as shown in the following corollary.

Corollary 3.2. In Theorem 3.2, if $I = 2$, then condition (C) is equivalent to

(C′) $(m_1 m_2 - 1) w_o + (m_1 - 1) w_{m,2} + (m_2 - 1) w_{m,1} < 1/2$.

Remark 3.4. When $w_o = 0$ and $w_{m,1} = w_{m,2} = w_m$, condition (C′) further reduces to $w_m < [2(m_1 + m_2 - 2)]^{-1}$. For example, if $m_1 = m_2 = 5$, then $w_m < 1/16$ is required to satisfy condition (C′).
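The quantity $u^*$ in condition (C) is a sum over all nonempty index sets, so it is easy to evaluate by brute force. The sketch below (hypothetical helper names) computes $u^*$ directly from its definition, confirms that for $I = 2$ it agrees with the left-hand side of (C′), and checks the threshold $1/16$ of Remark 3.4 on either side.

```python
from itertools import combinations
from math import isclose, prod


def u_star(w_o, w_m, m):
    """u* of condition (C): sum over nonempty index sets {i_1 < ... < i_l} of
    (w_o + sum of w_{m,j} for j outside the set) * prod_t (m_{i_t} - 1)."""
    I = len(m)
    total = 0.0
    for l in range(1, I + 1):
        for subset in combinations(range(I), l):
            weight = w_o + sum(w_m[j] for j in range(I) if j not in subset)
            total += weight * prod(m[i] - 1 for i in subset)
    return total


def u_star_two_covariates(w_o, w_m1, w_m2, m1, m2):
    """Left-hand side of (C') in Corollary 3.2 (I = 2)."""
    return (m1 * m2 - 1) * w_o + (m1 - 1) * w_m2 + (m2 - 1) * w_m1


print(isclose(u_star(0.1, [0.05, 0.07], [3, 4]),
              u_star_two_covariates(0.1, 0.05, 0.07, 3, 4)))   # True
# Remark 3.4: w_o = 0 and m1 = m2 = 5 require w_m < 1/16 = 0.0625.
print(u_star(0.0, [0.06, 0.06], [5, 5]) < 0.5)  # True  (u* = 0.48)
print(u_star(0.0, [0.07, 0.07], [5, 5]) < 0.5)  # False (u* = 0.56)
```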

Remark 3.5. In both Theorems 3.1 and 3.2, $w_s > 0$ is required. Therefore, the theoretical results in these theorems do not apply to Pocock and Simon's (1975) design (with $w_s = 0$). The simulation results in Table 4 (Section 4) show that the within-stratum imbalances under their method increase as the sample size increases, suggesting that they may not be of order $O_P(1)$. We hypothesize that the condition $w_s > 0$ is critical to ensure that $(\mathbf{D}_n)_{n \ge 1}$ is positive recurrent. These are problems for further research.

To prove the above two theorems, we use the technique of "drift conditions" [Meyn and Tweedie (1993)], which was developed for Markov chains on general state spaces. Applying their theory to our problem, in order to prove positive recurrence of $(\mathbf{D}_n)_{n \ge 1}$ we need to find a test function $V: \mathbb{Z}^m \to \mathbb{R}_+$, a bounded test set $C$ on $\mathbb{Z}^m$ and two positive constants $M_1$ and $M_2$ such that

$$\Delta V(\mathbf{D}) := \sum_{\mathbf{D}' \in \mathbb{Z}^m} P(\mathbf{D}, \mathbf{D}') V(\mathbf{D}') - V(\mathbf{D}) \qquad (3.2)$$

satisfies the following two conditions:

$$\Delta V(\mathbf{D}) \le -M_1, \quad \mathbf{D} \notin C, \qquad (3.3)$$

$$\Delta V(\mathbf{D}) \le M_2, \quad \mathbf{D} \in C, \qquad (3.4)$$

where $P(\mathbf{D}, \mathbf{D}')$ is the transition probability from $\mathbf{D}$ to $\mathbf{D}'$ on the state space $\mathbb{Z}^m$ of the chain $(\mathbf{D}_n)_{n \ge 1}$. $V$ is often a norm-like function on $\mathbb{Z}^m$. These drift conditions can roughly be interpreted as follows: as long as the average one-step movement $\Delta V$ tends to go back (with a magnitude uniformly greater than a positive constant $M_1$), that is, the chain is pulled back toward the finite set $C$, positive recurrence can be ensured. For proofs of the theorems, see Section 6 and the supplemental article [Hu and Hu (2012)].
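For intuition, the one-step drift (3.2) can be computed exactly in the 2 × 2 case, since from any state $\mathbf{D}$ the chain moves to one of only 8 neighbors. The sketch below is illustrative only: the test function $V(\mathbf{D}) = \sum D^2$ is an assumption chosen for simplicity (not claimed to be the test function used in the proofs), the helper names are hypothetical, and the stratum probabilities and NEW weights are those of Section 4.1. A negative value at one far-away state illustrates condition (3.3) but of course does not verify it for all $\mathbf{D} \notin C$.

```python
STRATA = [(1, 1), (1, 2), (2, 1), (2, 2)]


def delta(D, target, w):
    """delta_{n-1} of (3.1) for a 2 x 2 design; w = (w_o, w_m1, w_m2, w_s)."""
    w_o, w_m1, w_m2, w_s = w
    overall = sum(D.values())
    margin1 = sum(d for s, d in D.items() if s[0] == target[0])
    margin2 = sum(d for s, d in D.items() if s[1] == target[1])
    return w_o * overall + w_m1 * margin1 + w_m2 * margin2 + w_s * D[target]


def drift(D, probs, w, p, V):
    """Delta V(D) in (3.2): E[V(D_n) | D_{n-1} = D] - V(D)."""
    expected = 0.0
    for s, p_s in zip(STRATA, probs):
        d = delta(D, s, w)
        prob1 = 1 - p if d > 0 else (p if d < 0 else 0.5)  # step (7)
        up, down = dict(D), dict(D)
        up[s] += 1    # next patient falls in s and goes to treatment 1
        down[s] -= 1  # ... or to treatment 2
        expected += p_s * (prob1 * V(up) + (1 - prob1) * V(down))
    return expected - V(D)


def squared_norm(D):
    """Illustrative test function V (an assumption, see lead-in)."""
    return sum(d * d for d in D.values())


probs = (0.1, 0.2, 0.3, 0.4)  # stratum probabilities from Section 4.1
w_new = (3, 1, 1, 5)          # (0.3, 0.1, 0.1, 0.5) scaled by 10; sign of delta unchanged
far = {(1, 1): 10, (1, 2): 0, (2, 1): 0, (2, 2): 0}
origin = {s: 0 for s in STRATA}
print(round(drift(far, probs, w_new, 0.85, squared_norm), 6))     # -0.4
print(round(drift(origin, probs, w_new, 0.85, squared_norm), 6))  # 1.0
```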

4. Simulation studies. We compare the new procedure with stratified permuted block randomization and Pocock and Simon's (1975) minimization method. The simulations are divided into three parts. First, we simulate the case of 2 × 2 strata with a relatively large number of patients, to verify the convergence rate stated in Theorem 3.1. Second, we are interested in the performance of the different randomization methods when the number of strata is large compared with the sample size; an example of 500 patients and 10 covariates (each with 2 levels) will be studied. Finally, an example from Toorawa et al. (2009) will be considered, which is chosen because it resembles real situations in clinical trials.

4.1. 2 × 2 strata. For the three randomization procedures, we want to see whether the imbalances at each of the three levels (within-stratum, marginal and overall) stabilize, which would indicate the rate $O_P(1)$ at that specific level. The parameters are specified as follows:

- Multinomial probabilities $(p(1,1), p(1,2), p(2,1), p(2,2)) = (0.1, 0.2, 0.3, 0.4)$.

- Biasing probability $p = 0.85$ and $q = 0.15$ for Pocock and Simon's method (PS) as well as for the new procedure (NEW).

- Block size 4 for stratified randomization (STR-PB).

- Sample size $n = 200$, 500, 1000; number of simulated trials $N = 1000$.

- NEW: $(w_o, w_{m,1}, w_{m,2}, w_s) = (0.3, 0.1, 0.1, 0.5)$; conditions (A) and (B) are satisfied.

- PS: $(w_o, w_{m,1}, w_{m,2}, w_s) = (0, 0.5, 0.5, 0)$; conditions (A) and (B) are NOT satisfied.

Table 4 shows the standard deviations (std's) of the $D_n(\cdot)$'s at different levels (by symmetry of the designs, the theoretical mean of each $D_n(\cdot)$ is always 0). For simplicity, only the results for 2 strata and 2 margins are listed. Of the five columns, the first and second give the std's of the assignment differences within strata (1,1) and (2,2); the third and fourth give those of the marginal differences for covariate 1 at level 1 and covariate 2 at level 2; and the last gives that of the overall difference.

Table 4 suggests that all 5 standard deviations stabilize under NEW and STR-PB as the sample size increases. For example, under NEW the std's of $D_n(1,1)$ are 1.11, 1.14 and 1.03, and those of $D_n$ are 1.32, 1.22 and 1.27, which is consistent with our new procedure preserving the rate $O_P(1)$. The same conclusion can be reached for STR-PB. In fact, since the block size is 4, any within-stratum imbalance under STR-PB is bounded by 2. For PS, however, while the std's of the marginal and overall differences stabilize, those of the within-stratum differences do not. For example, the std of $D_n(1,1)$ increases from 3.16 to 4.80 and 7.25, much larger than those under the other two methods.

Table 4
Standard deviations of $D_n(\cdot)$ under several methods and sample sizes

Method   Sample size   $D_n(1,1)$   $D_n(2,2)$   $D_n(1;1)$   $D_n(2;2)$   $D_n$
STR-PB   200           0.92         0.89         1.30         1.27         1.83
         500           0.92         0.92         1.31         1.30         1.86
         1000          0.92         0.89         1.31         1.28         1.81
PS       200           3.16         3.27         1.15         1.13         1.30
         500           4.80         4.83         1.16         1.11         1.31
         1000          7.25         7.33         1.15         1.13         1.30
NEW      200           1.11         1.07         1.30         1.27         1.32
         500           1.14         1.10         1.33         1.28         1.22
         1000          1.03         1.10         1.20         1.24         1.27

For the within-stratum imbalances, STR-PB is the best [0.92 for $D_n(1,1)$], with NEW having slightly larger std's and PS the largest. For the marginal imbalances, PS is the best [around 1.15 for $D_n(1;1)$], and the other two are about the same [around 1.30 for $D_n(1;1)$]. For the overall imbalance, STR-PB (around 1.83) is not as good as NEW and PS (around 1.30). Therefore, even with only 4 strata, STR-PB does not perform well for the overall imbalance.
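The qualitative pattern of Table 4 (within-stratum std's that stay near 1 under NEW but grow with $n$ under PS) can be explored with a small simulation. This is a sketch under stated assumptions, not the authors' simulation code: it uses the Section 4.1 settings, decides each assignment by the sign of $\delta_{n-1}$ from (3.1) (equivalent to comparing the two imbalance measures), uses only 100 simulated trials instead of 1000 to keep the run short, and its function names are hypothetical. Because of the smaller $N$ and a different random number stream, the printed values will not match Table 4 digit for digit.

```python
import random
import statistics

STRATA = [(1, 1), (1, 2), (2, 1), (2, 2)]
PROBS = [0.1, 0.2, 0.3, 0.4]


def simulate_trial(n, w, p, rng):
    """Within-stratum differences after n patients under the weighted design.

    w = (w_o, w_m1, w_m2, w_s), given as integers (weights scaled by 10) so
    that exact ties delta == 0 are detected without floating-point error.
    """
    w_o, w_m1, w_m2, w_s = w
    D = {s: 0 for s in STRATA}
    for _ in range(n):
        target = rng.choices(STRATA, weights=PROBS)[0]
        overall = sum(D.values())
        margin1 = sum(d for s, d in D.items() if s[0] == target[0])
        margin2 = sum(d for s, d in D.items() if s[1] == target[1])
        delta = w_o * overall + w_m1 * margin1 + w_m2 * margin2 + w_s * D[target]
        prob1 = 1 - p if delta > 0 else (p if delta < 0 else 0.5)
        D[target] += 1 if rng.random() < prob1 else -1
    return D


def sd_within_11(n, w, trials=100, seed=2012):
    """Empirical std of D_n(1,1) over independent simulated trials."""
    rng = random.Random(seed)
    values = [simulate_trial(n, w, 0.85, rng)[(1, 1)] for _ in range(trials)]
    return statistics.pstdev(values)


NEW = (3, 1, 1, 5)  # (0.3, 0.1, 0.1, 0.5) x 10
PS = (0, 5, 5, 0)   # (0, 0.5, 0.5, 0) x 10
for n in (200, 500, 1000):
    # NEW should stay roughly flat while PS grows with n.
    print(n, round(sd_within_11(n, NEW), 2), round(sd_within_11(n, PS), 2))
```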

4.2. $2^{10}$ strata. We simulate a hypothetical trial involving 500 patients and 10 covariates with 2 levels each, that is, 1024 strata in total. The biased coin probabilities $p$ and $q$ for NEW and PS, the block size for STR-PB and the number of simulated trials $N$ remain the same. The covariates are generated as follows: in addition to independence of the covariates between patients, we further assume that within each patient the different covariates are independent and that each level of a given covariate is equally likely. Therefore, for the covariate profile $Z_i = (k_1, \dots, k_{10})$ of the $i$th patient, $k_1, \dots, k_{10}$ are independently sampled from $\{1, 2\}$. For the weights, we use $w_o = 0$, $w_s = 0.5$ and $w_{m,i} = 0.5/10$, $i = 1, \dots, 10$.

Of the 1024 strata, on average 61.4% have no patient, and only 0.1% have 4 or more. Hence, if STR-PB is employed, most blocks are incomplete, which tends to cause a large overall imbalance. Table 5 displays the mean absolute imbalances under each of the three randomization methods.

As seen in Table 5, STR-PB has an extremely large $E|D_n|$ (17.07). In comparison, the other two methods have much smaller values of 0.76 and 0.98. So in this respect PS has the best performance, and NEW is only slightly worse. In the second row, the mean absolute marginal imbalance is the average of the absolute differences over the 20 margins as well as over the 1000 simulations, and the interpretation is the same as for the overall imbalance. For the within-stratum imbalances, the table shows the results for strata with 2 or 3 patients. For example, under PS, 0.98 is the mean absolute difference over all strata with 2 patients as well as over the 1000 simulations. Under this criterion, PS is not recommended, since its two means, 0.98 and 1.23, are the largest among the three methods. STR-PB and NEW are quite similar, with means of 0.66 versus 0.50 for strata with 2 patients, and 1.00 versus 1.08 for strata with 3 patients. Hence, although our new procedure is not always the best, it ensures that no single type of imbalance becomes too extreme.

Table 5
Mean $|D_n(\cdot)|$ for $2^{10}$ strata and 500 patients

                              STR-PB   PS     NEW
Overall                       17.07    0.76   0.98
Marginal                      11.80    1.65   1.94
Within-stratum (2 patients)   0.66     0.98   0.50
Within-stratum (3 patients)   1.00     1.23   1.08

4.3. An example mimicking real clinical data. We chose an example from Toorawa et al. (2009). The four covariates are site, gender, age and disease status, with 20, 2, 2 and 2 levels, respectively, resulting in 160 strata. The distribution of the covariates is reproduced in Table 6, where the marginal distribution of sites is independent of the joint distribution of the remaining three covariates.



Table 6
Distribution of covariates

Sites
  Small (2 sites)                    1/120
  Medium (16 sites)                  6/120
  Large (2 sites)                    11/120

Other 3 covariates
  Male; < 60; Moderate disease       10/20
  Male; > 60; Moderate disease       2/20
  Male; < 60; Severe disease         2/20
  Male; > 60; Severe disease         2/20
  Female; < 60; Moderate disease     1/20
  Female; > 60; Moderate disease     1/20
  Female; < 60; Severe disease       1/20
  Female; > 60; Severe disease       1/20






Table 7
Distribution of patients among 160 strata

# of pts within stratum   0       1       2      3      4 and more
# of strata               95.4    38.8    12.7   5.6    7.6
Proportion                59.6%   24.3%   7.9%   3.5%   4.7%



120 patients enter the trial sequentially, and their covariates are indepen- 
dently simulated from the multinomial distribution in Table 6. We use the 
same p, q and block size as in the previous two examples. The weights are 
specified in the following way: 

- NEW: $w_o = w_s = 1/3$ and $w_{m,i} = 1/12$, $i = 1, \dots, 4$.

- PS: $w_o = w_s = 0$ and $w_{m,i} = 1/4$, $i = 1, \dots, 4$.
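To make the role of the weights concrete, here is a minimal Python sketch of a weighted covariate-adaptive allocation rule of this type. It assumes, as the drift formula in Section 6 suggests, that a new patient goes to treatment 1 with probability $q = 1 - p$ when the weighted imbalance $\delta$ is positive, with probability $p$ when it is negative, and with probability 1/2 on a tie. The function name `allocate`, the default `p = 0.85` (the actual $p$ used in the examples is not restated here) and the encoding of patients as covariate tuples are our own illustrative choices, not the authors' simulation code.

```python
import random
from collections import defaultdict


def allocate(profiles, w_o, w_m, w_s, p=0.85, seed=0):
    """Assign patients one by one; returns +1 (treatment 1) or -1 (treatment 2).

    profiles: covariate tuples (k_1, ..., k_I) in arrival order.
    w_o, w_m (one weight per covariate), w_s: weights on the overall,
    marginal and within-stratum imbalances.
    p: hypothetical biased-coin probability; q = 1 - p is assumed.
    """
    rng = random.Random(seed)
    overall = 0
    marginal = defaultdict(int)  # (covariate index, level) -> imbalance
    stratum = defaultdict(int)   # full profile -> imbalance
    arms = []
    for prof in profiles:
        # Weighted imbalance seen by the new patient; > 0 means arm 1 is ahead.
        delta = w_o * overall + w_s * stratum[prof]
        delta += sum(w * marginal[(i, k)] for i, (w, k) in enumerate(zip(w_m, prof)))
        prob_arm1 = 1 - p if delta > 0 else (p if delta < 0 else 0.5)
        arm = 1 if rng.random() < prob_arm1 else -1
        overall += arm
        stratum[prof] += arm
        for i, k in enumerate(prof):
            marginal[(i, k)] += arm
        arms.append(arm)
    return arms


# The two weight settings listed above.
NEW = dict(w_o=1 / 3, w_m=[1 / 12] * 4, w_s=1 / 3)
PS = dict(w_o=0.0, w_m=[1 / 4] * 4, w_s=0.0)
```

In this parametrization PS is just the special case $w_o = w_s = 0$, so calls such as `allocate(profiles, **NEW)` and `allocate(profiles, **PS)` differ only in the weights passed in.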

Table 7 shows the distribution of the 120 patients among the 160 strata. In
this case 24.3% of the strata have 1 patient, and 11.4% contain 2 or 3
patients. If stratified randomization is employed, then the patients in those
24.3% of strata have to be randomized with equal probabilities. Moreover, the
incomplete blocks in strata with 2 or 3 patients also pose a high risk of
large overall imbalance.
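The stratum counts in Table 7 can be approximated by simulation. The Python sketch below draws covariate profiles from Table 6, reading each site probability as a per-site value (2/120 + 96/120 + 22/120 = 1) and treating site as independent of the other three covariates, then counts how many of the 160 strata hold 0, 1, 2, 3 or at least 4 of the 120 patients. The names `sample_profiles`, `occupancy` and `mean_occupancy` and the replication count are our own; the averages should be of the same order as Table 7, but we have not checked them against the paper's exact simulation.

```python
import random
from collections import Counter

# Table 6, read as per-site probabilities: 2 small, 16 medium, 2 large sites.
SITE_PROBS = [1 / 120] * 2 + [6 / 120] * 16 + [11 / 120] * 2
# Joint distribution of (gender, age, disease), independent of site.
OTHER_PROBS = {
    ("male", "<60", "moderate"): 10 / 20,
    ("male", ">60", "moderate"): 2 / 20,
    ("male", "<60", "severe"): 2 / 20,
    ("male", ">60", "severe"): 2 / 20,
    ("female", "<60", "moderate"): 1 / 20,
    ("female", ">60", "moderate"): 1 / 20,
    ("female", "<60", "severe"): 1 / 20,
    ("female", ">60", "severe"): 1 / 20,
}
N_STRATA = len(SITE_PROBS) * len(OTHER_PROBS)  # 20 * 8 = 160


def sample_profiles(n, rng):
    """Draw n covariate profiles (site, gender, age, disease)."""
    sites = rng.choices(range(len(SITE_PROBS)), weights=SITE_PROBS, k=n)
    others = rng.choices(list(OTHER_PROBS), weights=list(OTHER_PROBS.values()), k=n)
    return [(site,) + other for site, other in zip(sites, others)]


def occupancy(profiles):
    """Number of strata holding 0, 1, 2, 3 and >= 4 patients."""
    per_stratum = Counter(profiles)
    table = Counter(min(count, 4) for count in per_stratum.values())
    table[0] = N_STRATA - len(per_stratum)  # strata that received nobody
    return [table[j] for j in range(5)]


def mean_occupancy(n=120, reps=1000, seed=0):
    """Average occupancy counts over reps simulated trials of n patients."""
    rng = random.Random(seed)
    totals = [0] * 5
    for _ in range(reps):
        for j, count in enumerate(occupancy(sample_profiles(n, rng))):
            totals[j] += count
    return [t / reps for t in totals]
```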

The mean absolute imbalances at the three levels are compared in Tables 8,
9 and 10. Table 8 shows the result for the overall imbalance and lists the
mean, median and 95% quantile of $|D_{120}|$. NEW has mean, median and 95%
quantile of 0.63, 0 and 2, respectively, whereas PS has a higher mean (0.91)
and the same median and 95% quantile. All three quantities are much larger
under STR-PB, which is therefore not recommended in this case.
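The three kinds of imbalance compared in Tables 8, 9 and 10 can all be read off one pass over a completed allocation. As a minimal sketch, assuming patients are stored as covariate tuples and arms as +1/-1 (our own encoding; the helper names are hypothetical), the overall imbalance is the sum of the arms, a marginal imbalance $D_n(i;k_i)$ sums the arms of patients whose $i$th covariate is at level $k_i$, and a within-stratum imbalance sums the arms of patients who share the whole profile.

```python
from collections import defaultdict


def imbalances(profiles, arms):
    """Return (overall, marginal, within-stratum, stratum sizes).

    profiles: covariate tuples; arms: +1 (treatment 1) / -1 (treatment 2).
    """
    overall = sum(arms)
    marginal = defaultdict(int)  # (covariate index, level) -> D_n(i; k_i)
    stratum = defaultdict(int)   # profile -> D_n(k_1, ..., k_I)
    size = defaultdict(int)      # profile -> number of patients
    for prof, arm in zip(profiles, arms):
        stratum[prof] += arm
        size[prof] += 1
        for i, level in enumerate(prof):
            marginal[(i, level)] += arm
    return overall, dict(marginal), dict(stratum), dict(size)


def mean_abs_within(stratum, size, n_pts):
    """Mean |D_n(k)| over strata that contain exactly n_pts patients."""
    values = [abs(stratum[k]) for k in stratum if size[k] == n_pts]
    return sum(values) / len(values) if values else float("nan")
```

Averaging `abs(overall)`, the absolute marginal values and `mean_abs_within(..., 2)` over repeated simulated trials gives summaries of the kind reported in Tables 8, 9 and 10.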

Table 9 gives the mean absolute marginal imbalances. For the covariates
gender, age and disease, the table lists the mean values on each of these 6
margins, as each covariate has only two levels. For example, over the 1000
simulations, the average absolute difference between the numbers of male
patients in the two treatment groups is 5.52, 1.10 and 1.59 under STR-PB,
PS and NEW, respectively. Therefore, in this respect PS has the best
performance; NEW is slightly worse, but still tolerable; STR-PB is the worst,
since its mean is as high as 5.52. Similar conclusions can be reached for the
other 5 margins. For the covariate site, there are 20 margins in total, too
many to show individually because of space limits. Hence, these 20 margins
are grouped into small, medium and large sites, and the mean values in the
table are further averaged over the margins within each group. For example,
1.32 is the mean absolute imbalance over the 16 medium-sized sites as well as
over the 1000 simulations. In terms of imbalances on margins defined by site,
PS is still the best, and STR-PB performs similarly to NEW. This is because
each site margin contains only 8 strata, so the "accumulating effect" of
within-stratum imbalances under STR-PB is not as strong.

Table 8
Comparison of absolute overall imbalance $|D_n|$

                  STR-PB    PS      NEW
Mean              6.70      0.91    0.63
Median            6         0       0
95% quantile      16        2       2

Table 9
Comparison of mean absolute marginal imbalances $E|D_n(i;k_i)|$

Covariate   Margin        STR-PB    PS      NEW
Gender      male          5.52      1.10    1.59
            female        3.86      1.06    1.55
Age         <60           4.84      1.08    1.57
            >60           4.40      1.11    1.23
Disease     moderate      5.01      1.10    1.56
            severe        4.35      1.18    1.52
20 sites    2 small       1.45      0.94    1.02
            16 medium     1.44      1.21    1.32
            2 large       1.47      1.33    1.52

Table 10 displays the distribution and mean of the absolute within-stratum
imbalances for strata with 2 or 3 patients. For example, among all strata
that contain 2 patients, the absolute difference is either 0 or 2; under NEW
the corresponding probabilities are 0.69 and 0.31, leading to an average of
0.62. According to this criterion, NEW has the lowest mean, STR-PB has a
slightly larger value (0.64) and PS has a mean as large as 0.86. For strata
containing 3 patients, since the block size is 4 for STR-PB, it is impossible
to get an absolute value of 3. Hence, the mean absolute imbalance under
STR-PB is 1, the minimum among the three methods.

Table 10
Comparison of absolute within-stratum imbalances $|D_n(k_1,\dots,k_I)|$: distribution and mean

# of pts within stratum   $|D_n(k_1,\dots,k_I)|$   STR-PB   PS      NEW
2                         prob(= 0)                0.68     0.57    0.69
                          prob(= 2)                0.32     0.43    0.31
                          mean                     0.64     0.86    0.62
3                         prob(= 1)                1.00     0.85    0.94
                          prob(= 3)                0.00     0.15    0.06
                          mean                     1.00     1.30    1.12
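The STR-PB entries for small strata follow from the block structure alone. A short Python enumeration, assuming permuted blocks of size 4 with two patients per arm and all distinct orderings equally likely (the function names are ours), gives the exact distribution of $|D|$ after the first $j$ patients of a block.

```python
from fractions import Fraction
from itertools import permutations


def block_distribution(j, block_size=4):
    """Exact distribution of |D| after the first j patients of one permuted block."""
    half = block_size // 2
    blocks = sorted(set(permutations([1] * half + [-1] * half)))
    dist = {}
    for block in blocks:
        value = abs(sum(block[:j]))
        dist[value] = dist.get(value, 0) + Fraction(1, len(blocks))
    return dist


def mean_abs(dist):
    """Mean of |D| under a {value: probability} distribution."""
    return sum(value * prob for value, prob in dist.items())
```

After three patients the only possible value is 1, which is why the STR-PB column shows prob(= 3) = 0.00 and mean 1.00; after two patients the enumeration gives 0 or 2 with exact probabilities 2/3 and 1/3 under uniformly random blocks.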

In summary, our new method maintains good balance from all three
perspectives and should be favored. We also performed the simulations under
other parameter values, including: (1) changing the weights $w_o$, $w_s$ and
$w_{m,i}$, as well as the block size; (2) $2 \times 100$ strata, representing
few covariates but many levels for at least one covariate; (3)
$3 \times 4 \times 5 \times 6$ strata, representing a few covariates with a
few levels each. In all of these settings, our new procedure shows advantages
over the other two methods.

5. Conclusion. In this paper we propose a new covariate-adaptive design 
that minimizes a weighted average of three types of imbalances (within- 
stratum, within-covariate-margin and overall). Simulation results show that 
the proposed method provides better allocation balance from different
perspectives, while stratified randomization and Pocock and Simon's (1975)
marginal method have large imbalances either overall or within strata.

The new procedure can also be generalized in several ways. In this pa- 
per we only considered balanced allocation (1:1), whereas in some problems 
unequal ratios [Hu and Rosenberger (2006)] are also desired. For example, 
if the two groups are an innovation versus a placebo, and a pilot study has 
shown some effect of the innovation, then it is more ethical to assign more 
patients to the innovation. If one treatment is much more costly than the 
other, then assigning more patients to the latter would be more economi- 
cal. Sometimes, the randomization has to be adapted to covariates as well 
as responses. Zhang et al. (2007) proposed "covariate-adjusted response- 
adaptive randomization," whose allocation ratio depends on both covariate 
profiles and responses of patients. One may modify our proposed procedure 
to accommodate these situations. On the other hand, some trials (e.g., some 
Phase II trials) involve the comparison of more than two treatments [Pocock 
and Simon (1975), Hu and Rosenberger (2006), etc.]. We can generalize the 
proposed procedure to clinical trials for comparing three or more treatments. 
We leave these as future research topics. 

For Efron's (1971) biased coin design (without involving covariates), it 
is well known that the imbalance is a positive recurrent Markov chain. 
Markaryan and Rosenberger (2010) studied some exact properties of Efron's 
(1971) biased coin design. However, to the best of our knowledge, there is no
theoretical result on the imbalance of covariate-adaptive randomization in
the literature, owing to the complexity of the problem and the lack of
technical tools.
In this paper, we introduced the technique of "drift conditions" in Markov 
chains to study the theoretical properties of covariate-adaptive randomiza- 
tion. This technique could provide a possible way of studying the properties 
of general covariate-adaptive designs as well as covariate-adjusted response- 
adaptive designs. 
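The drift technique is easiest to see in the covariate-free case mentioned above. The Python sketch below (the helper name `efron_drift` is ours) computes exactly the one-step drift $E[V(D_{n+1}) - V(D_n) \mid D_n = d]$ of the test function $V(d) = d^2$ under Efron's (1971) biased coin, where the under-represented arm is chosen with probability $p > 1/2$. It confirms the closed form $1 - 2(2p - 1)|d|$ for $d \ne 0$, which is negative once $|d| > 1/[2(2p - 1)]$. We assume here that the drift conditions take the usual Foster-Lyapunov shape: negative drift outside a finite test set and bounded drift inside it.

```python
from fractions import Fraction


def efron_drift(d, p):
    """Exact E[V(D_{n+1}) - V(D_n) | D_n = d] for V(d) = d**2 under Efron's coin.

    d: current imbalance (#treatment 1 minus #treatment 2);
    p: probability of assigning the under-represented treatment (p > 1/2).
    """
    if d > 0:
        prob_up = 1 - p          # treatment 1 is ahead, so it is chosen with prob 1 - p
    elif d < 0:
        prob_up = p              # treatment 1 is behind
    else:
        prob_up = Fraction(1, 2)  # tie: fair coin
    up = (d + 1) ** 2 - d ** 2    # change in V if treatment 1 is assigned
    down = (d - 1) ** 2 - d ** 2  # change in V if treatment 2 is assigned
    return prob_up * up + (1 - prob_up) * down
```

With $p = 2/3$ the drift is $1/3$ at $|d| = 1$ and $-1/3$ at $|d| = 2$, so the finite set $\{|d| \le 1\}$ can serve as the test set.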
Inference under covariate-adaptive randomization is also an important
issue. By simulation studies, several authors have raised concerns about
the conservativeness of unadjusted analyses (such as the two-sample t-test)
under covariate-adaptive randomization and suggested that all covariates 
that are used in the randomization should be included in the analysis [Bir- 
kett (1985), Forsythe (1987), etc.]. Shao, Yu and Zhong (2010) studied the 
theoretical relationship between different randomization designs and differ- 
ent inference methods. To make the problem more tractable, the authors 
focused on a simple homogeneous linear model. They found that if the un- 
derlying response-covariate model can be correctly specified, then the usual 
regression analysis is valid and has the highest power as compared to other 
types of analysis, no matter what randomization is employed. These results 
also apply to the proposed randomization procedure in this paper. 

If the model specification is not feasible and only a two-sample t-test 
can be used, then the test under stratified randomization tends to have 
a conservative type I error rate due to the overestimation of
$\mathrm{Var}(\bar{Y}_1 - \bar{Y}_2)$. Shao, Yu and Zhong (2010) used a
bootstrap method to correct the variance estimation. The resulting bootstrap
t-test restores the type I error rate, and
is more powerful than the traditional t-test under simple randomization. 
Similar bootstrap adjustment can be used as an inference method for the 
new randomization procedure. We leave this as a future research project. 

6. Sketch of proofs. 

Proof of Theorem 3.1. With $2 \times 2$ strata, the within-stratum imbalances $D_n$ and the multinomial probabilities $p$ are both $2 \times 2$ matrices. Let $\mathbf{D}_n = (D_{n,1}, D_{n,2}, D_{n,3}, D_{n,4}) := (D_n(1,1), D_n(1,2), D_n(2,1), D_n(2,2))$, that is, $\mathbf{D}_n$ is simply the vector form of $D_n$. The vector $\mathbf{p} = (p_1, \dots, p_4)$ is defined in the same way. With this notation, any stratum can be represented by the 2-index form $(k_1, k_2)$ or by the single-index form $(r)$, $1 \le r \le 4$. The quantity $\delta_{n-1}(k_1^*, k_2^*)$ in (3.1) then reduces to

$$
\begin{aligned}
\delta_{n-1}(k_1^*, k_2^*) &= (w_o + w_{m,1} + w_{m,2} + w_s) D_{n-1}(k_1^*, k_2^*) + (w_o + w_{m,1}) D_{n-1}(k_1^*, k_2) \\
&\quad + (w_o + w_{m,2}) D_{n-1}(k_1, k_2^*) + w_o D_{n-1}(k_1, k_2) \\
&= u_1 D_{n-1}(k_1^*, k_2^*) + u_2 D_{n-1}(k_1^*, k_2) + u_3 D_{n-1}(k_1, k_2^*) + u_4 D_{n-1}(k_1, k_2),
\end{aligned}
\tag{6.1}
$$

where $k_1 \ne k_1^*$, $k_2 \ne k_2^*$ and $u_1 = 1$. Let $\boldsymbol{\delta}_n = (\delta_{n,1}, \delta_{n,2}, \delta_{n,3}, \delta_{n,4}) := (\delta_n(1,1), \delta_n(1,2), \delta_n(2,1), \delta_n(2,2))$. Then, according to (6.1), $\mathbf{D}_n$ and $\boldsymbol{\delta}_n$ are linked by

$$
\boldsymbol{\delta}_n = \mathbf{D}_n U, \qquad
U = \begin{pmatrix}
u_1 & u_2 & u_3 & u_4 \\
u_2 & u_1 & u_4 & u_3 \\
u_3 & u_4 & u_1 & u_2 \\
u_4 & u_3 & u_2 & u_1
\end{pmatrix}.
\tag{6.2}
$$
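Equations (6.1) and (6.2) can be checked numerically. The Python sketch below (function names are illustrative) computes $\delta(k_1, k_2)$ directly as the weighted sum of the overall, marginal and within-stratum imbalances, which is our reading of (3.1) consistent with (6.1), and compares it with $\mathbf{D}U$ for the matrix in (6.2), with strata ordered (1,1), (1,2), (2,1), (2,2). Exact rational weights are used so the comparison is exact.

```python
from fractions import Fraction

ORDER = [(1, 1), (1, 2), (2, 1), (2, 2)]  # single-index order r = 1, ..., 4


def delta_direct(D, w_o, w_m1, w_m2, w_s):
    """delta(k1, k2) from overall, marginal and within-stratum imbalances.

    D maps each stratum (k1, k2) to its imbalance.
    """
    total = sum(D.values())
    out = {}
    for a, b in ORDER:
        margin1 = D[(a, 1)] + D[(a, 2)]  # first covariate at level a
        margin2 = D[(1, b)] + D[(2, b)]  # second covariate at level b
        out[(a, b)] = w_o * total + w_m1 * margin1 + w_m2 * margin2 + w_s * D[(a, b)]
    return out


def u_matrix(w_o, w_m1, w_m2, w_s):
    """The matrix U of (6.2)."""
    u1, u2, u3, u4 = w_o + w_m1 + w_m2 + w_s, w_o + w_m1, w_o + w_m2, w_o
    return [[u1, u2, u3, u4],
            [u2, u1, u4, u3],
            [u3, u4, u1, u2],
            [u4, u3, u2, u1]]


def delta_via_u(D, w_o, w_m1, w_m2, w_s):
    """Row vector D times U, returned keyed by stratum."""
    vec = [D[k] for k in ORDER]
    U = u_matrix(w_o, w_m1, w_m2, w_s)
    return {ORDER[c]: sum(vec[r] * U[r][c] for r in range(4)) for c in range(4)}
```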
For any $\mathbf{D}_n \in \mathbb{Z}^4$, we define a test function

$$
V(\mathbf{D}_n) = \sum_{r=1}^{4} \frac{D_{n,r}^2}{p_r},
$$

that is, the sum of squared within-stratum differences adjusted for the corresponding multinomial probabilities. The test set $C$ is defined as $C = \{\mathbf{D}_n : \max_r |D_{n,r}| \le K\}$ ($K > 0$ is to be determined). $V$ and $C$ are the key elements in proving positive recurrence, according to the drift conditions (3.3) and (3.4).

For ease of presentation, in the rest of the proof we simply write $D$ and $\delta$ for $\mathbf{D}_n$ and $\boldsymbol{\delta}_n$, respectively, unless specified otherwise. Under the new allocation rule, it can be derived that the one-step movement $\Delta V(D)$, defined in (3.2), has the form

$$
\Delta V(D) = 2(q - p) \sum_{r=1}^{4} \operatorname{sgn}(\delta_r) D_r + 4,
$$

where $D_r$ and $\delta_r$ are the $r$th elements of the vectors $D$ and $\delta$, respectively, and $\operatorname{sgn}(x) = 1, -1, 0$ for $x > 0$, $x < 0$ and $x = 0$, respectively. For the derivation of $\Delta V(D)$, see Section 1 of the supplemental article [Hu and Hu (2012)].
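The closed form for $\Delta V(D)$ can be checked by brute force: enumerate the stratum of the next patient and the arm that patient receives, and average the change in $V$. The Python sketch below does this for the $2 \times 2$ case. It assumes that a patient in stratum $r$ goes to treatment 1 with probability $q$ if $\delta_r > 0$, $p$ if $\delta_r < 0$ and 1/2 if $\delta_r = 0$, with $q = 1 - p$; the stratum probabilities, weights and value of $p$ in the checks are arbitrary illustrations.

```python
from fractions import Fraction


def sgn(x):
    return (x > 0) - (x < 0)


def u_matrix(w_o, w_m1, w_m2, w_s):
    """The matrix U of (6.2)."""
    u1, u2, u3, u4 = w_o + w_m1 + w_m2 + w_s, w_o + w_m1, w_o + w_m2, w_o
    return [[u1, u2, u3, u4], [u2, u1, u4, u3], [u3, u4, u1, u2], [u4, u3, u2, u1]]


def V(D, probs):
    """Test function: sum over strata of D_r^2 / p_r."""
    return sum(Fraction(d * d) / pr for d, pr in zip(D, probs))


def drift_exact(D, probs, U, p):
    """E[V(D') - V(D)] by enumerating next stratum and arm (q = 1 - p assumed)."""
    q = 1 - p
    delta = [sum(D[s] * U[s][r] for s in range(4)) for r in range(4)]
    v0 = V(D, probs)
    total = Fraction(0)
    for r in range(4):
        prob_t1 = {1: q, -1: p, 0: Fraction(1, 2)}[sgn(delta[r])]
        for step, prob in ((1, prob_t1), (-1, 1 - prob_t1)):
            moved = list(D)
            moved[r] += step
            total += probs[r] * prob * (V(moved, probs) - v0)
    return total, delta


def drift_formula(D, delta, p):
    """Closed form 2(q - p) * sum_r sgn(delta_r) D_r + 4."""
    q = 1 - p
    return 2 * (q - p) * sum(sgn(dl) * d for dl, d in zip(delta, D)) + 4
```

Note that the probabilities $p_r$ cancel: each stratum contributes $1 + 2(q - p)\operatorname{sgn}(\delta_r) D_r$, which is why the constant term equals the number of strata.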

We need to show that $\Delta V(D)$ satisfies the drift conditions (3.3) and (3.4). In fact, since the test set $C$ is bounded, (3.4) is trivially true. Since $q - p < 0$, (3.3) is equivalent to finding $M_1' > 2/(p - q)$ such that

$$
\Delta W(D) := \sum_{r=1}^{4} \operatorname{sgn}(\delta_r) D_r \ge M_1' \quad \text{for } D \notin C.
\tag{6.3}
$$

Intuitively, when $u_2$, $u_3$ and $u_4$ are small, $\delta_r$ is expected to be close to $D_r$, so that the two have the same sign. Thus, a larger proportion of the strata have $\operatorname{sgn}(\delta_r) D_r = \operatorname{sgn}(D_r) D_r = |D_r|$, and $\Delta W(D)$ tends to be positive. In the trivial case $u_2 = u_3 = u_4 = 0$, that is, $D = \delta$, we have $\Delta W(D) = \sum_{r=1}^{4} |D_r| > K$, so (6.3) holds by letting $M_1' = K = 2.1/(p - q)$. Therefore, in the following we can assume that $\max\{u_2, u_3, u_4\} > 0$.

For any $D \in C^c \subset \mathbb{Z}^4$, call the pair $(D_r, \delta_r)$ a "match" if $\delta_r \ne 0$ and $\delta_r D_r > 0$. Hence, for a match $\operatorname{sgn}(\delta_r) D_r = |D_r|$. Furthermore, define $M(D, \delta)$ as the number of matches among the pairs $(D_r, \delta_r)$, $r = 1, \dots, 4$. Obviously, $0 \le M(D, \delta) \le 4$. It can be shown that $M(D, \delta) = 0$ is impossible for $D \in C^c$. Therefore, for $M(D, \delta) = i$, $i = 1, 2, 3, 4$, if we can find $d_i > 0$ such that $\Delta W(D) > d_i K$, then (6.3) is true by letting $M_1' = K \min_i d_i$ and $K = 2.1/[(p - q) \min_i d_i]$.

In fact, finding $d_4$ for $M(D, \delta) = 4$ is quite trivial ($d_4 = 1$). We show below how to find $d_3$ for $M(D, \delta) = 3$. When $M(D, \delta) = 3$, we know that $a_1 = \max\{u_2, u_3, u_4\} \ne 0$ and $a_2 = \min\{1 - u_2, 1 - u_3, 1 - u_4\} \ne 0$ (since $w_s \ne 0$).
Without loss of generality, assume that $D_1$ and $\delta_1$ do not match, which means $\delta_1 D_1 \le 0$. Thus $|\delta_1 - D_1| \ge |D_1|$. By (6.2), $\delta_1 - D_1 = u_2 D_2 + u_3 D_3 + u_4 D_4$, which implies $|u_2 D_2 + u_3 D_3 + u_4 D_4| \ge |D_1|$. Then

$$
\begin{aligned}
\Delta W(D) &\ge -|D_1| + |D_2| + |D_3| + |D_4| \\
&\ge -(u_2 |D_2| + u_3 |D_3| + u_4 |D_4|) + |D_2| + |D_3| + |D_4| \\
&\ge a_2 (|D_2| + |D_3| + |D_4|) \\
&\ge a_2 \left[ \tfrac{1}{2} (|D_2| + |D_3| + |D_4|) + \tfrac{1}{2} a_1^{-1} |D_1| \right] \\
&\ge \frac{a_2}{2} \min\{1, a_1^{-1}\} \cdot \max\{|D_1|, |D_2|, |D_3|, |D_4|\} \\
&> \frac{a_2}{2} \min\{1, a_1^{-1}\} \cdot K := d_3 K,
\end{aligned}
$$

where the fourth inequality uses $|D_1| \le a_1 (|D_2| + |D_3| + |D_4|)$ and the last uses $D \notin C$.
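As a sanity check on this chain of inequalities (not a substitute for it), the Python sketch below draws random integer vectors $D$, keeps those with exactly three matches, and verifies $\Delta W(D) \ge d_3 \max_r |D_r|$ with $d_3 = (a_2/2)\min\{1, a_1^{-1}\}$. The weights in the check are arbitrary illustrative values with $w_s > 0$, and the function name is ours; exact rational arithmetic avoids rounding issues when $\delta_r$ is close to 0.

```python
import random
from fractions import Fraction


def u_matrix(w_o, w_m1, w_m2, w_s):
    """The matrix U of (6.2)."""
    u1, u2, u3, u4 = w_o + w_m1 + w_m2 + w_s, w_o + w_m1, w_o + w_m2, w_o
    return [[u1, u2, u3, u4], [u2, u1, u4, u3], [u3, u4, u1, u2], [u4, u3, u2, u1]]


def check_d3_bound(weights, trials=5000, span=20, seed=1):
    """Return (bound held in every case, number of cases with exactly 3 matches)."""
    U = u_matrix(*weights)
    u2, u3, u4 = U[0][1], U[0][2], U[0][3]
    a1 = max(u2, u3, u4)
    a2 = min(1 - u2, 1 - u3, 1 - u4)
    d3 = a2 / 2 * min(Fraction(1), 1 / a1)
    rng = random.Random(seed)
    seen = 0
    for _ in range(trials):
        D = [rng.randint(-span, span) for _ in range(4)]
        delta = [sum(D[s] * U[s][r] for s in range(4)) for r in range(4)]
        # A match needs delta_r * D_r > 0 (which already forces delta_r != 0).
        matches = sum(1 for dl, d in zip(delta, D) if dl * d > 0)
        if matches != 3:
            continue
        seen += 1
        dW = sum(((dl > 0) - (dl < 0)) * d for dl, d in zip(delta, D))
        if dW < d3 * max(abs(d) for d in D):
            return False, seen
    return True, seen
```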

The ways of finding $d_2$ and $d_1$ for $M(D, \delta) = 2$ and $1$ are similar, but require more work. In particular, condition (B) in Theorem 3.1 is needed to verify the case $M(D, \delta) = 1$. In Section 2 of the supplemental article [Hu and Hu (2012)], we show how to find $d_i$ for $i = 4, 3, 2, 1$ and explain why $M(D, \delta) \ne 0$.

Corollary 3.1 is obtained by solving the linear equation for $x$ in Theorem 3.1 under the assumption that $w_{m,1} = w_{m,2}$ and then substituting the solution in $|x_1| + |x_2| + |x_3| < 1$. For a detailed proof of Corollary 3.1, see Section 3 of the supplemental article [Hu and Hu (2012)]. □

Proof of Theorem 3.2. The main steps are similar to those in Theorem 3.1. Let $\mathbf{D}_n = (D_{n,1}, \dots, D_{n,m})$ be the vector version of $D_n = (D_n(k_1, \dots, k_I))$, where the $m$ strata can be arbitrarily ordered and indexed by $1, \dots, m$. Similarly, let $\boldsymbol{\delta}_n$ and $\mathbf{p}$ be the vector forms of the arrays $(\delta_n(k_1, \dots, k_I))$ and $(p(k_1, \dots, k_I))$, respectively, using the same order as in $(D_n(1), \dots, D_n(m))$. With this notation, any stratum can be represented by the $I$-index form $(k_1, \dots, k_I)$ or by the single-index form $(r)$, $1 \le r \le m$. As in the $2 \times 2$ case, let $\boldsymbol{\delta}_n := \mathbf{D}_n U$. Then, by the definition of $\boldsymbol{\delta}_n$ and the description of the weights before Theorem 3.2, for any two strata $(r) = (k_1^*, \dots, k_I^*)$ and $(s) = (k_1, \dots, k_I)$, the element $u_{rs}$ of the matrix $U$ is determined as follows: for any fixed $i_1 < i_2 < \cdots < i_l$ with $\{i_1, \dots, i_l\} \subseteq \{1, \dots, I\}$, if $k_{i_t} \ne k_{i_t}^*$ for $1 \le t \le l$ and $k_j = k_j^*$ for $j \notin \{i_1, \dots, i_l\}$, then

$$
u_{rs} = w_o + \sum_{j \notin \{i_1, \dots, i_l\}} w_{m,j}.
$$

So $u_{rs} = u_{sr}$, and for any $r$, $\sum_{s=1,\, s \ne r}^{m} u_{rs} = u^*$, as defined in Theorem 3.2.
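The entries $u_{rs}$ and the constant $u^*$ are easy to tabulate for any number of covariates. In the Python sketch below (function names are ours), strata are the Cartesian product of the covariate levels, the diagonal is set to $w_o + \sum_j w_{m,j} + w_s$, which equals 1 when the weights sum to 1 (matching $u_1 = 1$ in the $2 \times 2$ case), and off-diagonal entries follow the rule above. The checks confirm that every row has the same off-diagonal sum $u^*$, which for covariates with $L_j$ levels works out to $(m - 1) w_o + \sum_j w_{m,j} (m/L_j - 1)$.

```python
from fractions import Fraction
from itertools import product


def general_u(levels, w_o, w_m, w_s):
    """Strata (in product order) and the m x m matrix U with entries u_rs.

    levels: number of levels L_j of each covariate; w_m: one weight per covariate.
    """
    strata = list(product(*(range(L) for L in levels)))
    U = []
    for a in strata:
        row = []
        for b in strata:
            if a == b:
                row.append(w_o + sum(w_m) + w_s)
            else:
                # Covariates on which strata a and b agree contribute their marginal weight.
                row.append(w_o + sum(w for w, x, y in zip(w_m, a, b) if x == y))
        U.append(row)
    return strata, U


def off_diagonal_sums(U):
    """Sum over s != r of u_rs, for each row r."""
    return [sum(row) - row[r] for r, row in enumerate(U)]
```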

The test function $V$ and the test set $C$ are still defined as before, except that in this case the dimension of $\mathbf{D}_n$ is $m$ instead of 4. We write $D$ and $\delta$ for $\mathbf{D}_n$ and $\boldsymbol{\delta}_n$, respectively. In the same manner, verifying the drift conditions is equivalent to finding $M_2' > 2/(p - q)$ such that

$$
\Delta W(D) := \sum_{r=1}^{m} \operatorname{sgn}(\delta_r) D_r \ge M_2' \quad \text{for } D \notin C.
\tag{6.4}
$$

For any fixed $D \in C^c \subset \mathbb{Z}^m$, suppose that among the pairs $(D_r, \delta_r)$, $r = 1, \dots, m$, there are $m_0$ mismatched pairs. Without loss of generality, assume that the mismatched pairs occur in the 1st, 2nd, ..., and $m_0$th strata. By the definition of a mismatched pair, $D_r$ and $\delta_r = \sum_{s=1}^{r-1} u_{rs} D_s + D_r + \sum_{s=r+1}^{m} u_{rs} D_s$ do not share the same sign, that is, $\delta_r D_r \le 0$, for $r = 1, \dots, m_0$. Therefore,

$$
\begin{aligned}
|D_r| &\le \left| \sum_{s=1}^{r-1} u_{rs} D_s + \sum_{s=r+1}^{m} u_{rs} D_s \right| \\
&\le \sum_{s=1, \dots, m_0,\ s \ne r} u_{rs} |D_s| + \sum_{s=m_0+1}^{m} u_{rs} |D_s|.
\end{aligned}
\tag{6.5}
$$

First, we notice that $m_0 \ne m$; otherwise, by summing (6.5) over $r = 1$ to $m$, we would have $\sum_{r=1}^{m} |D_r| \le u^* \sum_{r=1}^{m} |D_r|$, which is impossible for $D \in C^c$ and $u^* < 1/2$. Second, suppose $m_0 \ne 0$. By summing (6.5) over $r = 1$ to $r = m_0$, we have

$$
\sum_{r=1}^{m_0} \Bigl(1 - \sum_{s=1, \dots, m_0,\ s \ne r} u_{rs}\Bigr) |D_r| \le \sum_{r=m_0+1}^{m} \Bigl(\sum_{s=1}^{m_0} u_{rs}\Bigr) |D_r|.
$$

Combined with the facts that $1 - u^* \le 1 - \sum_{s=1, \dots, m_0,\ s \ne r} u_{rs}$ for $r = 1, \dots, m_0$ and $\sum_{s=1}^{m_0} u_{rs} \le u^*$ for $r = m_0 + 1, \dots, m$, this gives

$$
\sum_{r=1}^{m_0} (1 - u^*) |D_r| \le u^* \sum_{r=m_0+1}^{m} |D_r|.
$$

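Then

$$
\begin{aligned}
\Delta W(D) &\ge -|D_1| - |D_2| - \cdots - |D_{m_0}| + |D_{m_0+1}| + |D_{m_0+2}| + \cdots + |D_m| \\
&\ge -\frac{u^*}{1 - u^*} \sum_{r=m_0+1}^{m} |D_r| + \sum_{r=m_0+1}^{m} |D_r| = \Bigl(1 - \frac{u^*}{1 - u^*}\Bigr) \sum_{r=m_0+1}^{m} |D_r|.
\end{aligned}
$$

Since $0 < u^* < 1/2$, we have $0 < 1 - u^*/(1 - u^*) < 1$. Hence, the above inequality also holds for $m_0 = 0$. If $\max_{m_0+1 \le r \le m} |D_r| > K$, then $\Delta W(D) > (1 - u^*/(1 - u^*)) K$; otherwise $\max_{1 \le r \le m_0} |D_r| > K$ and $\sum_{r=m_0+1}^{m} |D_r| \ge \frac{1 - u^*}{u^*} \sum_{r=1}^{m_0} |D_r|$, which means

$$
\Delta W(D) > \Bigl(1 - \frac{u^*}{1 - u^*}\Bigr) \frac{1 - u^*}{u^*} K > \Bigl(1 - \frac{u^*}{1 - u^*}\Bigr) K.
$$

Thus, if we define $M_2' = (1 - u^*/(1 - u^*)) K$ and $K = \frac{2.1}{p - q} \bigl(1 - \frac{u^*}{1 - u^*}\bigr)^{-1}$, then

$$
\Delta W(D) > M_2' > 2/(p - q). \qquad \square
$$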
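The conclusion of this argument, $\Delta W(D) \ge (1 - u^*/(1 - u^*)) \max_r |D_r|$ whenever $u^* < 1/2$, can likewise be spot-checked numerically. The Python sketch below (illustrative names and weights, exact rational arithmetic) builds $U$ for a $2 \times 2 \times 2$ design, draws random integer vectors $D$, and tests the inequality; it is a sanity check of the sketch above, not a proof.

```python
import random
from fractions import Fraction
from itertools import product


def general_u(levels, w_o, w_m, w_s):
    """Strata (in product order) and the matrix U with entries u_rs."""
    strata = list(product(*(range(L) for L in levels)))
    U = [[w_o + sum(w_m) + w_s if a == b
          else w_o + sum(w for w, x, y in zip(w_m, a, b) if x == y)
          for b in strata] for a in strata]
    return strata, U


def check_bound(levels, w_o, w_m, w_s, trials=300, span=10, seed=2):
    """True if Delta W(D) >= (1 - u*/(1 - u*)) * max|D_r| on all sampled D."""
    strata, U = general_u(levels, w_o, w_m, w_s)
    m = len(strata)
    u_star = sum(U[0][s] for s in range(1, m))  # same for every row
    assert u_star < Fraction(1, 2), "the argument needs u* < 1/2"
    c = 1 - u_star / (1 - u_star)
    rng = random.Random(seed)
    for _ in range(trials):
        D = [rng.randint(-span, span) for _ in range(m)]
        delta = [sum(D[s] * U[s][r] for s in range(m)) for r in range(m)]
        dW = sum(((dl > 0) - (dl < 0)) * d for dl, d in zip(delta, D))
        if dW < c * max(abs(d) for d in D):
            return False
    return True
```

With $w_o = 0.01$, $w_{m,j} = 0.03$ and $w_s = 0.90$ on three binary covariates, $u^* = 7(0.01) + 3(0.03)(4 - 1) = 0.34$, so the condition $u^* < 1/2$ is met.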
a much improved version of the paper.

SUPPLEMENTARY MATERIAL 

Additional proofs (DOI: 10.1214/12-AOS983SUPP; .pdf). We provide additional
proofs that are omitted in Section 6. They include: (1) the derivation of
$\Delta V(D)$; (2) the appropriate choice of $d_i$ when $M(D, \delta) = i$
($i = 4, 3, 2, 1$); (3) the proof of Corollary 3.1.